The ability to record high-fidelity videos at high acquisition rates is central to the study of fast-moving phenomena. The difficulty of imaging fast-moving scenes lies in a trade-off between motion blur and underexposure noise. On the one hand, recordings with long exposure times suffer from motion blur caused by movements in the recorded scene. On the other hand, the amount of light reaching the camera photosensors decreases as the exposure time shortens, so short-exposure recordings suffer from underexposure noise. In this paper, we propose to address this trade-off by treating high-speed imaging as an underexposed image denoising problem. We combine recent advances in underexposed image denoising using deep learning and adapt these methods to the specifics of the high-speed imaging problem. Leveraging large external datasets with a sensor-specific noise model, our method is able to speed up the acquisition rate of a high-speed camera by more than one order of magnitude while maintaining similar image quality.
translated by 谷歌翻译
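Recent works have shown that implicit neural representations (INRs) can carry meaningful representations of signal derivatives. In this work, we leverage this property to perform video frame interpolation (VFI) by explicitly constraining the derivatives of the INR to satisfy the optical flow constraint equation. Using only the target video and its optical flow, we achieve state-of-the-art VFI on limited motion ranges without learning the interpolation operator from additional training data. We further show that constraining the INR derivatives not only allows better interpolation of intermediate frames but also improves the ability of narrow networks to fit the observed frames, which suggests potential applications to video compression and INR optimization.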
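As a rough illustration of the constraint mentioned above, the classical optical flow (brightness constancy) equation can be written for an INR as follows. The symbols $f_\theta$ (the INR mapping a space-time coordinate to intensity), $(u, v)$ (the given optical flow), and the penalty $\mathcal{L}_{\mathrm{flow}}$ are notation assumed here for illustration; the abstract does not state the exact loss the authors use.

```latex
% Optical flow constraint applied to the INR f_\theta(x, y, t),
% with (u, v) the optical flow at (x, y, t):
\frac{\partial f_\theta}{\partial x}\, u
  + \frac{\partial f_\theta}{\partial y}\, v
  + \frac{\partial f_\theta}{\partial t} = 0 .

% One possible (assumed) penalty that encourages the constraint
% over sampled coordinates:
\mathcal{L}_{\mathrm{flow}}
  = \mathbb{E}_{(x, y, t)}
    \left|
      \frac{\partial f_\theta}{\partial x}\, u
      + \frac{\partial f_\theta}{\partial y}\, v
      + \frac{\partial f_\theta}{\partial t}
    \right| .
```

Because the derivatives of $f_\theta$ are available analytically through automatic differentiation, a term of this kind could be added to the usual reconstruction loss on the observed frames.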
Microswimmers can acquire information on the surrounding fluid by sensing mechanical cues. They can then navigate in response to these signals. We analyse this navigation by combining deep reinforcement learning with direct numerical simulations to resolve the hydrodynamics. We study how local and non-local information can be used to train a swimmer to achieve particular swimming tasks in a non-uniform flow field, in particular a zig-zag shear flow. The swimming tasks are learning how to swim in (1) the vorticity direction, (2) the shear-gradient direction, and (3) the shear flow direction. We find that access to lab frame information on the swimmer's instantaneous orientation is all that is required to reach the optimal policy for (1) and (2). However, information on both the translational and rotational velocities seems to be required to achieve (3). Inspired by biological microorganisms, we also consider the case where the swimmers sense local information, i.e. surface hydrodynamic forces, together with a signal direction. This might correspond to gravity or, for microorganisms with light sensors, a light source. In this case, we show that the swimmer can reach a level of performance comparable to that of a swimmer with access to lab frame variables. We also analyse the role of different swimming modes, i.e. pusher, puller, and neutral swimmers.
Photonic reinforcement learning, which exploits the physical nature of light to accelerate computation, has recently been studied extensively. Previous studies utilized quantum interference of photons to achieve collective decision-making without choice conflicts when solving the competitive multi-armed bandit problem, a fundamental example of reinforcement learning. However, the bandit problem deals with a static environment where the agent's action does not influence the reward probabilities. This study aims to extend the conventional approach to a more general multi-agent reinforcement learning setting targeting the grid world problem. Unlike the conventional approach, the proposed scheme deals with a dynamic environment where the reward changes because of agents' actions. A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm. This study proposes a novel learning algorithm, discontinuous bandit Q-learning, in view of a potential photonic implementation. Here, state-action pairs in the environment are regarded as slot machines in the context of the bandit problem, and the amount by which a Q-value is updated is regarded as the reward of the bandit problem. We perform numerical simulations to validate the effectiveness of the bandit algorithm. In addition, we propose a multi-agent architecture in which agents are indirectly connected through quantum interference of light, and quantum principles ensure the conflict-free property of state-action pair selections among agents. We demonstrate that multi-agent reinforcement learning can be accelerated owing to conflict avoidance among multiple agents.
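To make the mapping between Q-learning and the bandit problem concrete, the following sketch writes the standard tabular Q-learning update and labels the update amount as the bandit reward, as the abstract describes. The learning rate $\alpha$, discount factor $\gamma$, and the symbol $R_{\mathrm{bandit}}$ are notation assumed here; the abstract does not give the exact form of discontinuous bandit Q-learning, so this is not necessarily the authors' update rule.

```latex
% Standard Q-learning update amount for the visited pair (s_t, a_t):
\Delta Q(s_t, a_t)
  = \alpha \left[ r_{t+1}
      + \gamma \max_{a'} Q(s_{t+1}, a')
      - Q(s_t, a_t) \right],
\qquad
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \Delta Q(s_t, a_t).

% Bandit view (assumed notation): each pair (s, a) is a slot machine,
% and the update amount plays the role of its reward:
R_{\mathrm{bandit}}(s_t, a_t) = \Delta Q(s_t, a_t).
```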
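Intelligent monitoring using three-dimensional (3D) image sensors has been attracting attention in the context of smart cities. In intelligent monitoring, object detection on point cloud data acquired by 3D image sensors is used to detect moving objects such as vehicles and pedestrians and thereby ensure safety on the road. However, the features of point cloud data are diverse because of the characteristics of the light detection and ranging (LIDAR) units used as 3D image sensors and the installation positions of those sensors. Although a variety of deep learning (DL) models for object detection from point cloud data have been studied to date, no research has considered how to use multiple DL models in accordance with the features of the point cloud data. In this work, we propose a feature-based model selection framework that creates various DL models by using multiple DL methods and by utilizing training data with pseudo-incompleteness generated by two artificial techniques: sampling and noise addition. It selects the most suitable DL model for the object detection task in accordance with the features of the point cloud data acquired in the real environment. To demonstrate the effectiveness of the proposed framework, we compare the performance of multiple DL models using benchmark datasets created from the KITTI dataset and present example results of object detection obtained through a real outdoor experiment. Depending on the situation, the detection accuracy varies by up to 32% between DL models, which confirms the importance of selecting an appropriate DL model according to the situation.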
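Collective decision-making is crucial to recent information and communication technologies. In our previous research, we mathematically derived conflict-free joint decision-making that optimally satisfies players' probabilistic preference profiles. However, two problems exist regarding the optimal joint decision-making method. First, as the number of choices increases, the computational cost of calculating the optimal joint selection probability matrix explodes. Second, to derive the optimal joint selection probability matrix, all players must disclose their probabilistic preferences. It is noteworthy, however, that explicit calculation of the joint probability distribution is not necessarily needed; what collective decision-making requires is sampling. This study examines several sampling methods that converge to heuristic joint selection probability matrices satisfying the players' preferences. We show that they can significantly reduce the computational cost and confidentiality problems described above. We analyze the probability distribution to which each sampling method converges, as well as the computational cost required and the confidentiality secured. In particular, we introduce two conflict-free joint sampling methods based on quantum interference of photons. The first system allows the players to hide their choices while satisfying their preferences almost perfectly when the players have the same preferences. The second system, in which the physical nature of light replaces the expensive computation, also conceals the players' choices under the assumption that they have a trusted third party.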
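In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human supervision, or self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k, while FDSL reaches comparable or higher accuracy when pre-trained under the same conditions (number of images, hyperparameters, and number of epochs). Images generated by formulas avoid the privacy and copyright issues, labeling costs and errors, and biases that real images suffer from, and thus have tremendous potential for pre-training general models. To understand the performance of the synthetic images, we tested two hypotheses, namely (i) object contours are what matter in FDSL datasets and (ii) an increased number of parameters used to create labels affects the performance improvement in FDSL pre-training. To test the former hypothesis, we constructed a dataset consisting of simple combinations of object contours. We found that this dataset can match the performance of fractals. For the latter hypothesis, we found that increasing the difficulty of the pre-training task generally leads to better fine-tuning accuracy.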
This paper describes a method to perform offline and online speaker diarization for an unlimited number of speakers. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended to a flexible number of speakers by introducing speaker-wise attractors. However, the output number of speakers of attractor-based EEND is empirically capped; it cannot deal with cases where the number of speakers appearing during inference is higher than that during training because its speaker counting is trained in a fully supervised manner. Our method, EEND-GLA, solves this problem by introducing unsupervised clustering into attractor-based EEND. In the method, the input audio is first divided into short blocks, then attractor-based diarization is performed for each block, and finally the results of each block are clustered on the basis of the similarity between locally calculated attractors. While the number of output speakers is limited within each block, the total number of speakers estimated for the entire input can exceed this limit. To use EEND-GLA in an online manner, our method also extends the speaker-tracing buffer, which was originally proposed to enable online inference of conventional EEND. We introduce a block-wise buffer update to make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve online diarization, we refine the buffer update method and revisit the variable chunk-size training of EEND. The experimental results demonstrate that EEND-GLA can perform speaker diarization for an unseen number of speakers in both offline and online inference.
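The block-wise clustering step described above can be illustrated with a minimal sketch. It assumes each block has already produced a few local attractor vectors, and it swaps in a simple greedy cosine-similarity assignment rather than the clustering actually used in EEND-GLA, which the abstract does not specify. The function names, the `threshold` value, and the example vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def cluster_local_attractors(blocks, threshold=0.8):
    """Assign a global speaker id to every local attractor.

    blocks: list of blocks, each a list of attractor vectors.
    Returns a list of lists of global ids with the same shape.
    Attractors in the same block are never merged, since they are
    assumed to represent different speakers (a cannot-link constraint).
    """
    centroids = []  # running sum of member attractors per global speaker
    labels = []
    for block in blocks:
        used = set()  # global ids already taken within this block
        block_labels = []
        for attractor in block:
            best_id, best_sim = None, threshold
            for gid, centroid in enumerate(centroids):
                if gid in used:
                    continue
                sim = cosine(attractor, centroid)
                if sim >= best_sim:
                    best_id, best_sim = gid, sim
            if best_id is None:
                # No sufficiently similar speaker yet: open a new cluster.
                best_id = len(centroids)
                centroids.append(list(attractor))
            else:
                # Fold the attractor into the matched speaker's running sum.
                centroids[best_id] = [
                    c + x for c, x in zip(centroids[best_id], attractor)
                ]
            used.add(best_id)
            block_labels.append(best_id)
        labels.append(block_labels)
    return labels


# Hypothetical example: three blocks with at most two speakers each.
example_blocks = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],
    [[0.95, 0.05, 0.0], [0.0, 0.1, 1.0]],
]
example_labels = cluster_local_attractors(example_blocks)
num_speakers = len({gid for block in example_labels for gid in block})
```

In this toy input each block contains only two attractors, yet three distinct global speakers are recovered across the whole recording, which mirrors the point that the total speaker count can exceed the per-block limit.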
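This paper proposes a novel training method for end-to-end scene text recognition. End-to-end scene text recognition offers high recognition accuracy, especially when using a Transformer-based encoder-decoder model. To train a highly accurate end-to-end model, we need to prepare a large image-to-text paired dataset for the target language. However, such data are difficult to collect, especially for resource-poor languages. To overcome this difficulty, our proposed method utilizes well-prepared large datasets in resource-rich languages, such as English, to train the encoder-decoder model for the resource-poor language. Our key idea is to build a model in which the encoder reflects knowledge of multiple languages while the decoder specializes in the resource-poor language. To this end, the proposed method pre-trains the encoder on a multilingual dataset that combines the resource-poor language's dataset and the resource-rich language's dataset in order to learn language-invariant knowledge for scene text recognition. The proposed method also pre-trains the decoder on the resource-poor language's dataset to make the decoder better suited to that language. Experiments on Japanese scene text recognition using a small public dataset demonstrate the effectiveness of the proposed method.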
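This paper presents a novel knowledge distillation method for dialogue sequence labeling. Dialogue sequence labeling is a supervised learning task that estimates a label for each utterance in the target dialogue document, and it is useful for many applications such as dialogue act estimation. Accurate labeling is often achieved by hierarchically structured large models consisting of utterance-level and dialogue-level networks, which capture the contexts within an utterance and between utterances, respectively. However, because of their size, such models cannot be deployed on resource-constrained devices. To overcome this difficulty, we focus on knowledge distillation, which trains a small model by distilling the knowledge of a large, high-performance teacher model. Our key idea is to distill the knowledge while preserving the complex contexts captured by the teacher model. To this end, the proposed method, hierarchical knowledge distillation, trains the small model by distilling the knowledge of the utterance-level and dialogue-level contexts learned by the teacher model, training the small model to mimic the teacher model's output at each level. Experiments on dialogue act estimation and call scene segmentation demonstrate the effectiveness of the proposed method.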